Data Analysis Presented by:

Average GPA for the entire subject of CS in Spring 2023: 2.5, which is a C+. The hardest class was CSCI 320, with an average GPA of 1.9.


Average withdrawal rate for the entire subject of CS in Spring 2023 (total withdrawals divided by total enrollment across all sections): 24.32%
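The 24.32% figure is a pooled rate: calculate_teacher_gpas() divides the subject's total W count by its total enrollment, so large sections weigh more than small ones. As a sketch, here is that calculation next to a plain average of per-section rates, using the three CSCI 320 sections from the CSCI table further down (TOTAL 79, 108, 44; W 12, 26, 29):

```python
# Enrollment (TOTAL) and withdrawals (W) for the three CSCI 320 sections in the CSCI table
totals = [79, 108, 44]
withdrawals = [12, 26, 29]

# Pooled rate: total withdrawals / total enrollment (how the subject-wide figure is computed)
pooled_rate = sum(withdrawals) / sum(totals)

# Alternative for comparison: average of per-section rates, giving every section equal weight
per_section = [w / t for w, t in zip(withdrawals, totals)]
mean_of_rates = sum(per_section) / len(per_section)

print(f"pooled: {pooled_rate:.2%}, mean of sections: {mean_of_rates:.2%}")
# pooled: 29.00%, mean of sections: 35.06%
```

The two numbers answer different questions: the pooled rate is the share of all enrolled students who withdrew, while the mean of section rates describes a typical section.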


Standard Deviation, in the context of this subject, tells us about the SPREAD OF GRADES THAT STUDENTS RECEIVED WITHIN EACH CLASS. Averaged across the CS courses taught by more than one professor, it was 1.23. On a 0.0 to 4.0 scale this can be considered quite high: it indicates either a significant number of both very high and very low grades, or grades distributed fairly evenly across the range from low to high. Note also that grading systems differ between classes in this subject, for example curving grades or distributing them around a class's average score.


Nonetheless, if the distribution of scores is approximately normal (i.e., follows a bell curve), the empirical rule states:

  • About 68% of the data falls within one standard deviation of the mean. Thus, 68% of scores would be within mean −1.23 to mean +1.23
  • About 95% falls within two standard deviations
  • About 99.7% falls within three standard deviations


Bringing it all together:

  • If the AVERAGE GPA of a CS class is 2.5 and the AVERAGE STANDARD DEVIATION is 1.23, then (assuming a roughly normal distribution) about 68% of students who COMPLETE a CS class will get a GPA ranging from 1.3 to 3.7, since 2.5 − 1.23 ≈ 1.3 and 2.5 + 1.23 ≈ 3.7.
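The ranges above follow directly from mean ± k × standard deviation. A minimal sketch, assuming the 2.5 mean and 1.23 standard deviation reported in this notebook and clipping to the 0.0 to 4.0 GPA scale (empirical_rule_ranges() is a hypothetical helper, not one of the notebook's functions):

```python
def empirical_rule_ranges(mean, sd, lower=0.0, upper=4.0):
    """Return (k, share, low, high) for k = 1, 2, 3 standard deviations,
    clipped to the GPA scale [lower, upper]. Hypothetical helper."""
    shares = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical-rule percentages
    ranges = []
    for k, share in shares.items():
        low = max(mean - k * sd, lower)
        high = min(mean + k * sd, upper)
        ranges.append((k, share, low, high))
    return ranges

for k, share, low, high in empirical_rule_ranges(2.5, 1.23):
    print(f"~{share}% within {k} SD: {low:.2f} to {high:.2f}")
# ~68.0% within 1 SD: 1.27 to 3.73
# ~95.0% within 2 SD: 0.04 to 4.00
# ~99.7% within 3 SD: 0.00 to 4.00
```

With these inputs the two- and three-standard-deviation bands run past the ends of the GPA scale and have to be clipped, which suggests the bell-curve approximation is only rough for bounded data like GPAs.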


IMPORTANT FUNCTIONS USED IN THIS NOTEBOOK:

  • gpa_letter_converter(), which takes a GPA and converts it into a letter grade
  • calculate_average_gpas(), which goes through every class number and every professor teaching it, computing the average GPA and GPA standard deviation for each
  • calculate_teacher_gpas(), which goes through every teacher in a subject irrespective of class number, computes their average GPA, GPA standard deviation, and withdrawal percentage, and compares them with the subject averages

Note: These functions are used for the rest of the analysis of subjects by class!

SCROLL DOWN FOR DATA VISUALIZATIONS AND INDIVIDUAL STATS ON TEACHERS!

Also, check out this website for the basics of formatting text in Jupyter Notebook!

  • https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/format-text-with-markdown-jupyter-notebook/
In [1]:
import pandas as pd
import numpy as np

# Spring 2023 grade distributions, exported from Google Sheets as CSV
url = "https://docs.google.com/spreadsheets/d/1mS6khEB6m8cPNenNvY9Tg6bJ6YkmcvCI/export?format=csv&gid=1327379612"
df = pd.read_csv(url)

# Rough shape of the data (one row per section):
# {'TERM': ['S2023', .... ],
#  'SUBJECT': ['ACCT', .... 'BIO'..... , ''], ...}
df.head()
Out[1]:
TERM SUBJECT NBR COURSE NAME SECTION PROF TOTAL A+ A A- ... B B- C+ C C- D F W INC/NA AVG GPA
0 S2023 AACS 107 Immigrant Communities Queens 1 KHANDELWAL, M 13 0 1 5 ... 1 0 0 0 0 0 0 1 0 3.643
1 S2023 ACCT 100 Fin & Mgr Acct 1 HO, V 20 0 5 3 ... 5 3 0 0 0 0 0 3 0 3.382
2 S2023 ACCT 101 Intro Thry & Prac of Acct I 11 CHAN, J 29 0 13 1 ... 5 0 2 0 0 1 0 4 0 3.448
3 S2023 ACCT 101 Intro Thry & Prac of Acct I 8 FEISULLIN, A 40 5 2 5 ... 3 3 8 2 0 0 2 5 1 2.918
4 S2023 ACCT 101 Intro Thry & Prac of Acct I 9 SUN, F 40 0 2 10 ... 5 0 1 1 3 5 2 6 1 2.634

5 rows × 21 columns

In [2]:
analysis_df = df.drop(columns = ['TERM','SECTION','INC/NA'])
analysis_df
Out[2]:
SUBJECT NBR COURSE NAME PROF TOTAL A+ A A- B+ B B- C+ C C- D F W AVG GPA
0 AACS 107 Immigrant Communities Queens KHANDELWAL, M 13 0 1 5 0 1 0 0 0 0 0 0 1 3.643
1 ACCT 100 Fin & Mgr Acct HO, V 20 0 5 3 1 5 3 0 0 0 0 0 3 3.382
2 ACCT 101 Intro Thry & Prac of Acct I CHAN, J 29 0 13 1 3 5 0 2 0 0 1 0 4 3.448
3 ACCT 101 Intro Thry & Prac of Acct I FEISULLIN, A 40 5 2 5 4 3 3 8 2 0 0 2 5 2.918
4 ACCT 101 Intro Thry & Prac of Acct I SUN, F 40 0 2 10 3 5 0 1 1 3 5 2 6 2.634
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2403 URBST 371W VT: Service Learning Project GOLDFISCHER, E 23 4 7 3 2 3 2 0 0 1 0 0 1 3.536
2404 URBST 373W Spec. Problems-Environ Studies CONSTANTINIDES, C 20 7 10 0 0 0 0 0 0 0 0 0 3 4.000
2405 WGS 101W Intro Women & Gender Studies GIARDINA, C 25 0 6 7 4 4 1 0 0 0 0 0 2 3.536
2406 WGS 101W Intro Women & Gender Studies GIARDINA, C 25 0 5 5 5 2 1 1 0 1 1 1 3 3.123
2407 WGS 201W Theories of Feminism CRANDALL, E 15 0 3 4 0 0 1 2 1 1 0 0 3 3.150

2408 rows × 18 columns

All Main Functions Below:

The isinstance() function returns True if the specified object is of the specified type (or a subclass of it), otherwise False (https://stackoverflow.com/questions/1549801/what-are-the-differences-between-type-and-isinstance)
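A short illustration of how isinstance() behaves in gpa_letter_converter() and how it differs from an exact type() check. The subclass case matters because values produced by pandas, such as the rounded means passed to the converter, are typically NumPy float64 scalars, which subclass Python's float:

```python
import numpy as np

# gpa_letter_converter() uses isinstance() to tell the single float for "A"
# apart from the tuples of values used for the other letter grades
print(isinstance(4.0, float), isinstance((3.7, 3.8, 3.9), tuple))  # True True

# isinstance() also accepts subclasses; an exact type() comparison does not
value = np.float64(2.5)          # the kind of scalar pandas typically returns from .mean()
print(isinstance(value, float))  # True: np.float64 subclasses float
print(type(value) is float)      # False: type() matches the exact class only
```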

In pandas, unique() and nunique() both deal with the distinct values of a Series, but they serve different purposes and return different types of output:

  • unique(): Returns an array of the distinct values in the order they first appear in the Series (NaN included). It's useful when you want to see or use the actual values (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.unique.html)

  • nunique(): Returns an integer count of the distinct values (NaN excluded by default; on a DataFrame it returns one count per column). It's useful when you only need to know how many distinct values exist, rather than what they are (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html)
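A small sketch of the difference, using a made-up PROF column with one repeated name and one missing value:

```python
import numpy as np
import pandas as pd

# A small PROF column with a repeat and a missing value (toy data)
profs = pd.Series(["TAO, X", "YEH, J", "TAO, X", np.nan])

print(profs.unique())               # distinct values in order of first appearance, NaN included
print(profs.nunique())              # 2 (NaN is excluded by default)
print(profs.nunique(dropna=False))  # 3
```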

iloc[]:

  • Purely integer-location based indexing for selection by position (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html)

sort_values():

  • Sort by values along either axis (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html)
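calculate_average_gpas() combines these two calls to find the hardest class: sort by average GPA ascending, then take the first row by position. A sketch using four "All Professors" averages copied from the results table further down; idxmin() with loc is an equivalent alternative shown only for comparison:

```python
import pandas as pd

# Course-level averages copied from the "All Professors" rows of the results table
course_avgs = pd.DataFrame({
    "CLASS NUMBER": [111, 211, 316, 320],
    "PROF": ["All Professors"] * 4,
    "AVG GPA PROF": [2.2, 2.0, 2.0, 1.9],
})

# Sort ascending by GPA, then take the first row by position
hardest = course_avgs.sort_values("AVG GPA PROF").iloc[0]
print(int(hardest["CLASS NUMBER"]))  # 320

# Equivalent: find the index label of the minimum GPA, then select that row by label
same_row = course_avgs.loc[course_avgs["AVG GPA PROF"].idxmin()]
print(int(same_row["CLASS NUMBER"]))  # 320
```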

extend():

  • Adds the specified list elements (or any iterable) to the end of the current list (https://www.w3schools.com/python/ref_list_extend.asp)

np.std():

  • Compute the standard deviation along the specified axis (https://numpy.org/doc/stable/reference/generated/numpy.std.html)
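calculate_average_gpas() and calculate_teacher_gpas() compute a GPA standard deviation by using extend() to expand each letter-grade count into one GPA value per student, then passing that list to np.std() (population standard deviation, ddof=0, with W excluded). The sketch below repeats this for the single CSCI 323 section taught by KAHROBAEI, D in the CSCI table, and checks it against a weighted-variance version built with np.average(), which avoids building the long list. The weighted version is an alternative for comparison, not what the notebook uses:

```python
import numpy as np

# GPA points per letter grade, as in calculate_average_gpas()
letter_grades_to_gpa = {"A+": 4.0, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0,
                        "B-": 2.7, "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0}

# Letter-grade counts for CSCI 323, KAHROBAEI, D (copied from the CSCI table; W excluded)
counts = {"A+": 6, "A": 14, "A-": 3, "B+": 1, "B": 2, "B-": 1,
          "C+": 0, "C": 2, "C-": 0, "D": 0, "F": 0}

# Approach used in the notebook: one GPA value per student, then np.std
expanded = []
for letter, points in letter_grades_to_gpa.items():
    expanded.extend([points] * counts[letter])
std_expanded = np.std(expanded)

# Alternative: weighted mean and variance computed directly from the counts
points = np.array([letter_grades_to_gpa[g] for g in counts])
weights = np.array([counts[g] for g in counts])
mean_weighted = np.average(points, weights=weights)
std_weighted = np.sqrt(np.average((points - mean_weighted) ** 2, weights=weights))

print(round(mean_weighted, 3), round(std_expanded, 2), round(std_weighted, 2))
# 3.693 0.58 0.58
```

The mean reproduces the section's AVG GPA of 3.693 and the standard deviation matches the 0.58 reported for this professor in CSCI 323, which suggests the dataset's AVG GPA counts A+ as 4.0 and leaves W out.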
In [3]:
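def gpa_letter_converter(gpa):
    # Map a GPA rounded to one decimal place to a letter grade.
    # Each letter covers a range of GPAs, e.g. 3.3 to 3.6 -> "B+".
    # Returns None if the GPA is not a one-decimal value between 0.0 and 4.0.
    letter_grades = {
        "A": 4.0,
        "A-": (3.7, 3.8, 3.9),
        "B+": (3.3, 3.4, 3.5, 3.6),
        "B": (3.0, 3.1, 3.2),
        "B-": (2.7, 2.8, 2.9),
        "C+": (2.3, 2.4, 2.5, 2.6),
        "C": (2.0, 2.1, 2.2),
        "C-": (1.7, 1.8, 1.9),
        "D": (1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6),
        "F": (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)  # include 0.0 so a zero average maps to F
    }
    for letter_grade, number_grade in letter_grades.items():
        # "A" maps to a single float; every other letter maps to a tuple of GPAs
        if isinstance(number_grade, float) and gpa == number_grade:
            return letter_grade
        elif isinstance(number_grade, tuple) and gpa in number_grade:
            return letter_grade

    return None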

def calculate_average_gpas(df):
    # Prepare a list to store the results
    results = []

    # GPA equivalents for each letter grade
    letter_grades_to_gpa = {
        "A+": 4.0,
        "A": 4.0,
        "A-": 3.7,
        "B+": 3.3,
        "B": 3.0,
        "B-": 2.7,
        "C+": 2.3,
        "C": 2.0,
        "C-": 1.7,
        "D": 1.0,
        "F": 0.0
    }

    # Loop over all unique course numbers
    for class_nbr in df["NBR"].unique():
        # Filter the DataFrame for the current course number
        df_nbr = df[df["NBR"] == class_nbr]

        # Loop over all unique professors for the current course number
        for prof in df_nbr["PROF"].unique():
            # Filter the DataFrame for the current professor
            df_prof = df_nbr[df_nbr["PROF"] == prof]

            # Calculate the average GPA for the current professor and course number
            avg_gpa_prof = round(df_prof["AVG GPA"].mean(), 1)

            # Convert the individual grade counts to GPA equivalents and calculate the standard deviation
            gpa_distributions = []
            for grade_letter, gpa in letter_grades_to_gpa.items():
                gpa_distributions.extend([gpa] * df_prof[grade_letter].sum())
            std_dev_gpa_prof = round(np.std(gpa_distributions), 2)

            # Append the result to the list
            results.append({
                "CLASS NUMBER": class_nbr,
                "PROF": prof,
                "AVG GPA PROF": avg_gpa_prof,
                "AVG GPA PROF LETTER": gpa_letter_converter(avg_gpa_prof),
                "STD DEV GPA PROF": std_dev_gpa_prof
            })

        # If there is more than one professor for this course, calculate the average GPA for the current course number, regardless of the professor
        if df_nbr["PROF"].nunique() > 1:
            avg_gpa_nbr = round(df_nbr["AVG GPA"].mean(), 1)

            # Convert the individual grade counts to GPA equivalents and calculate the standard deviation
            gpa_distributions = []
            for grade_letter, gpa in letter_grades_to_gpa.items():
                gpa_distributions.extend([gpa] * df_nbr[grade_letter].sum())
            std_dev_gpa_nbr = round(np.std(gpa_distributions), 2)

            # Append the result to the list
            results.append({
                "CLASS NUMBER": class_nbr,
                "PROF": "All Professors",
                "AVG GPA PROF": avg_gpa_nbr,
                "AVG GPA PROF LETTER": gpa_letter_converter(avg_gpa_nbr),
                "STD DEV GPA PROF": std_dev_gpa_nbr
            })

    # Convert the list of results to a DataFrame
    df_results = pd.DataFrame(results)
    
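    # Find the hardest class based on average GPA.
    # Only courses taught by more than one professor get an "All Professors" row,
    # so single-professor courses are not considered here.
    hardest_class = df_results[df_results["PROF"] == "All Professors"].sort_values("AVG GPA PROF").iloc[0]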

    # Calculate the average GPA for the entire subject
    avg_gpa_subject = round(df["AVG GPA"].mean(), 1)

    print(f"Average GPA for this entire subject in Spring 2023 is: {avg_gpa_subject}, which is a {gpa_letter_converter(avg_gpa_subject)}")
    print(f"The hardest class based on average GPA in Spring 2023 is class number: {hardest_class['CLASS NUMBER']} with an average GPA of {hardest_class['AVG GPA PROF']}, which is a {hardest_class['AVG GPA PROF LETTER']}")
    print("\nStandard deviation tells us about the spread of the grades that students received in each class. A higher standard deviation indicates a wider range of grades, while a lower standard deviation indicates that grades were more closely clustered around the average.")

    return df_results

def calculate_teacher_gpas(df):
    # Prepare a list to store the results
    results = []

    # GPA equivalents for each letter grade
    letter_grades_to_gpa = {
        "A+": 4.0,
        "A": 4.0,
        "A-": 3.7,
        "B+": 3.3,
        "B": 3.0,
        "B-": 2.7,
        "C+": 2.3,
        "C": 2.0,
        "C-": 1.7,
        "D": 1.0,
        "F": 0.0
    }

    # Loop over all unique professors
    for prof in sorted(df["PROF"].unique()):
        # Filter the DataFrame for the current professor
        df_prof = df[df["PROF"] == prof]

        # Calculate the average GPA for the current professor
        avg_gpa_prof = round(df_prof["AVG GPA"].mean(), 1)

        # Convert the individual grade counts to GPA equivalents and calculate the standard deviation
        gpa_distributions = []
        for grade_letter, gpa in letter_grades_to_gpa.items():
            gpa_distributions.extend([gpa] * df_prof[grade_letter].sum())
        std_dev_gpa_prof = round(np.std(gpa_distributions), 1)

        # Calculate the percentage of students who withdrew for the current professor
        withdraw_percentage = df_prof["W"].sum() / df_prof["TOTAL"].sum()

        # Append the result to the list
        results.append({
            "PROF": prof,
            "AVG GPA PROF": avg_gpa_prof,
            "AVG GPA PROF LETTER": gpa_letter_converter(avg_gpa_prof),
            "STD DEV GPA PROF": std_dev_gpa_prof,
            "NUM OF CLASSES": len(df_prof),
            "WITHDRAW PERCENTAGE": round(withdraw_percentage * 100, 1)
        })

    # Convert the list of results to a DataFrame
    df_results = pd.DataFrame(results)

    # Calculate the average GPA for the entire subject
    avg_gpa_subject = round(df["AVG GPA"].mean(), 1)

    # Calculate the standard deviation of all individual grades pooled across the entire subject
    gpa_distributions = []
    for grade_letter, gpa in letter_grades_to_gpa.items():
        gpa_distributions.extend([gpa] * df[grade_letter].sum())
    avg_std_dev_subject = round(np.std(gpa_distributions), 1)
    
    # Calculate the average withdrawal percentage for the entire subject
    withdraw_percentage_subject = round((df["W"].sum() / df["TOTAL"].sum()) * 100, 2)
    
    # Find the professors who teach more than one class, have an average GPA at or above the subject average GPA,
    # and have a withdrawal percentage at or below the subject average withdrawal percentage
    best_profs = df_results[(df_results["NUM OF CLASSES"] > 1) & (df_results["AVG GPA PROF"] >= avg_gpa_subject) & (df_results["WITHDRAW PERCENTAGE"] <= withdraw_percentage_subject)]

    print(f"Professors who teach more than one class, have a GPA that is the same or higher than the subject average GPA ({avg_gpa_subject}), and have a withdrawal percentage that is less than or equal to the subject average withdrawal percentage ({withdraw_percentage_subject}%):\n")
    print(best_profs, "\n")
    print("Note: This ignores rate my professor ratings. This is solely based on GPA and it doesn't consider how much students actually learn from these teachers!")
    print(f"Average standard deviation of GPA for all teachers in this subject in Spring 2023 is: {avg_std_dev_subject}")

    return df_results

dropna():

  • Remove missing values (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)
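The In [4] cell below uses dropna() together with boolean filters. A toy example with made-up rows (the "DOE, J" professor and the MATH row are hypothetical) shows which rows each step removes:

```python
import numpy as np
import pandas as pd

# Toy sections: one with no professor listed, one with a 0 average GPA
sections = pd.DataFrame({
    "SUBJECT": ["CSCI", "CSCI", "CSCI", "MATH"],
    "PROF": ["TAO, X", np.nan, "YEH, J", "DOE, J"],
    "AVG GPA": [2.643, 3.1, 0.0, 2.9],
})

cleaned = sections.dropna(subset=["PROF"])       # drop rows with no professor
cleaned = cleaned[cleaned["AVG GPA"] != 0]       # drop sections with a 0 average
cs_only = cleaned[cleaned["SUBJECT"] == "CSCI"]  # keep only CS sections

print(len(sections), len(cleaned), len(cs_only))  # 4 2 1
```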
In [4]:
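# Drop sections with no listed professor and sections whose average GPA is 0
analysis_df = analysis_df.dropna(subset=["PROF"])
analysis_df = analysis_df[analysis_df["AVG GPA"] != 0]

# Keep only Computer Science sections
cs_df = analysis_df[analysis_df["SUBJECT"] == "CSCI"]

# Show every row instead of a truncated preview
pd.set_option('display.max_rows', None)
cs_df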
Out[4]:
SUBJECT NBR COURSE NAME PROF TOTAL A+ A A- B+ B B- C+ C C- D F W AVG GPA
566 CSCI 12 Intro Computers & Computation HUANG, X 38 1 3 4 2 6 4 4 3 0 1 0 10 2.943
568 CSCI 48 Spreadsheet Programming FRIED, M 94 12 16 7 12 11 4 5 4 5 7 3 8 2.980
569 CSCI 48 Spreadsheet Programming JAGDEO, M 78 5 12 10 10 7 7 2 4 3 4 8 6 2.772
576 CSCI 85 Database App Programming HILL, D 34 1 4 3 4 1 2 3 0 1 4 5 4 2.332
577 CSCI 90 Topics in Computing CONNOR, T 13 2 11 0 0 0 0 0 0 0 0 0 0 4.000
578 CSCI 111 Intro Algorithmic Problem Solv TSE, C 136 0 9 7 2 5 16 13 8 0 0 15 57 2.301
579 CSCI 111 Intro Algorithmic Problem Solv CHYN, X 163 2 12 4 8 15 9 9 13 3 3 21 62 2.235
580 CSCI 111 Intro Algorithmic Problem Solv CHYN, X 236 1 7 7 11 21 12 22 10 15 19 29 82 1.979
599 CSCI 211 Object-Oriented Program in C++ WAXMAN, J 143 13 8 9 3 16 17 9 16 0 8 21 22 2.348
600 CSCI 211 Object-Oriented Program in C++ WAXMAN, J 211 18 7 6 10 17 30 8 28 0 18 33 36 2.169
601 CSCI 211 Object-Oriented Program in C++ ALAYEV, Y 12 0 0 0 1 0 0 1 0 2 3 1 4 1.500
617 CSCI 212 Object-Oriented Prog in Java LORD, K 150 11 11 10 11 16 6 11 8 6 9 11 40 2.600
618 CSCI 212 Object-Oriented Prog in Java STEINBERG, O 85 1 7 8 10 1 4 5 4 1 3 11 29 2.411
619 CSCI 212 Object-Oriented Prog in Java LORD, K 165 3 11 10 12 13 12 9 13 7 11 21 43 2.243
634 CSCI 220 Discrete Structures KAHROBAEI, D 35 4 10 6 1 3 1 0 0 2 0 0 7 3.578
635 CSCI 220 Discrete Structures KAHROBAEI, D 35 4 11 3 4 4 0 1 1 0 0 1 6 3.469
636 CSCI 220 Discrete Structures TAO, X 30 0 2 5 2 4 7 2 2 2 1 0 3 2.852
637 CSCI 220 Discrete Structures KONG, T 22 3 3 1 2 1 1 0 1 0 4 3 2 2.421
638 CSCI 220 Discrete Structures GRYAK, J 35 1 1 1 0 4 6 11 3 0 0 3 5 2.373
639 CSCI 220 Discrete Structures GRYAK, J 27 2 1 0 0 1 5 2 4 0 0 5 7 2.055
640 CSCI 220 Discrete Structures KONG, T 23 0 1 1 0 0 0 4 3 0 0 3 10 1.908
641 CSCI 240 Computer Org & Assembly Lang YEH, J 40 3 7 9 1 2 3 4 2 1 0 4 4 2.933
642 CSCI 240 Computer Org & Assembly Lang YEH, J 40 3 1 5 5 7 1 3 3 3 0 3 6 2.726
643 CSCI 240 Computer Org & Assembly Lang YEH, J 40 2 5 6 1 5 4 3 2 3 0 4 5 2.723
644 CSCI 240 Computer Org & Assembly Lang TAO, X 32 0 3 2 5 6 2 2 6 2 0 2 2 2.643
645 CSCI 240 Computer Org & Assembly Lang TAO, X 32 0 1 1 3 5 4 3 6 1 0 2 6 2.462
646 CSCI 240 Computer Org & Assembly Lang FLUTURE, S 64 1 3 1 0 1 4 8 2 0 0 10 34 1.863
647 CSCI 313 Data Structures TAO, X 32 0 3 3 4 7 1 4 3 0 0 6 1 2.426
648 CSCI 313 Data Structures MITCHELL, T 20 0 1 1 0 2 2 2 2 3 1 0 6 2.414
649 CSCI 313 Data Structures STEINBERG, O 32 1 0 2 3 5 8 2 4 0 0 5 2 2.350
650 CSCI 313 Data Structures MITCHELL, T 27 1 1 3 2 0 5 0 1 4 1 3 6 2.333
651 CSCI 313 Data Structures MITCHELL, T 25 1 2 1 3 1 2 1 1 1 0 5 7 2.222
652 CSCI 313 Data Structures SVADLENKA, J 14 2 0 0 1 0 1 0 2 0 1 2 5 2.111
653 CSCI 313 Data Structures STEINBERG, O 32 1 6 4 2 0 2 1 3 0 0 11 2 2.103
654 CSCI 316 Principles of Programming Lang SMITH-THOMPSON, A 33 3 1 3 0 3 3 6 5 0 2 1 5 2.593
655 CSCI 316 Principles of Programming Lang KONG, T 51 1 4 0 2 6 10 4 3 0 1 10 10 2.141
656 CSCI 316 Principles of Programming Lang SMITH-THOMPSON, A 33 0 0 0 1 5 1 5 12 1 3 2 3 2.040
657 CSCI 316 Principles of Programming Lang SMITH-THOMPSON, A 33 0 1 2 1 3 0 7 6 0 4 5 2 1.924
658 CSCI 316 Principles of Programming Lang SVITAK, J 31 0 1 1 1 1 2 0 2 3 8 2 10 1.738
659 CSCI 316 Principles of Programming Lang SVITAK, J 53 0 0 0 1 1 2 3 5 5 18 4 13 1.413
660 CSCI 320 Theory of Computation OBRENIC, B 79 11 6 3 4 15 4 6 7 0 1 10 12 2.640
661 CSCI 320 Theory of Computation OBRENIC, B 108 6 7 3 1 15 10 6 21 0 2 11 26 2.393
662 CSCI 320 Theory of Computation BOKLAN, K 44 0 1 0 0 0 0 0 4 0 0 10 29 0.800
663 CSCI 323 Design & Analysis Algorithms KAHROBAEI, D 35 6 14 3 1 2 1 0 2 0 0 0 6 3.693
664 CSCI 323 Design & Analysis Algorithms PHILLIPS, T 27 3 1 1 3 4 0 0 2 0 0 4 9 2.533
665 CSCI 323 Design & Analysis Algorithms PHILLIPS, T 26 3 2 2 2 2 1 0 1 0 0 5 8 2.483
666 CSCI 323 Design & Analysis Algorithms CHYN, X 35 0 1 3 3 5 3 2 5 0 0 5 8 2.322
667 CSCI 323 Design & Analysis Algorithms BROWN, T 30 1 1 1 0 2 1 1 8 0 0 9 6 1.613
668 CSCI 331 Database Systems SY, B 24 0 1 1 2 3 2 0 1 2 0 0 12 2.842
669 CSCI 331 Database Systems OBRENIC, B 60 3 1 2 4 6 5 6 13 0 0 3 17 2.509
670 CSCI 331 Database Systems CHYN, X 31 1 3 2 1 5 3 4 4 0 0 4 4 2.481
671 CSCI 331 Database Systems LEAVITT, D 36 0 3 2 1 7 1 4 9 0 4 1 4 2.425
672 CSCI 331 Database Systems HELLER, P 38 0 2 0 4 1 2 3 6 0 0 3 17 2.310
673 CSCI 335 Information Org and Retrieval GOLDBERG, R 32 0 3 2 3 9 6 4 2 0 1 0 2 2.890
674 CSCI 340 Operating Systems Principles SVADLENKA, J 37 1 3 4 3 3 1 1 7 0 1 1 11 2.788
675 CSCI 340 Operating Systems Principles FLUTURE, S 35 1 2 1 1 2 5 3 6 0 0 0 13 2.733
676 CSCI 340 Operating Systems Principles SMITH-THOMPSON, A 35 3 2 1 0 6 7 2 8 0 0 1 4 2.707
677 CSCI 340 Operating Systems Principles SVADLENKA, J 27 2 0 2 2 1 2 1 2 0 3 0 12 2.647
678 CSCI 340 Operating Systems Principles FLUTURE, S 35 1 0 3 0 1 3 6 4 0 0 2 14 2.400
679 CSCI 340 Operating Systems Principles SVADLENKA, J 31 1 2 1 1 1 0 1 2 0 1 4 15 2.093
680 CSCI 340 Operating Systems Principles RAHMAN, M 37 0 0 1 0 1 6 5 13 0 0 6 5 1.888
681 CSCI 343 Computer Architecture FLUTURE, S 33 4 5 1 3 0 0 3 2 0 0 0 15 3.361
682 CSCI 343 Computer Architecture UPADHYAY, V 35 3 5 1 1 4 3 2 7 0 0 1 8 2.878
683 CSCI 343 Computer Architecture FLUTURE, S 42 1 1 3 4 3 4 3 2 0 0 1 20 2.864
684 CSCI 343 Computer Architecture UPADHYAY, V 34 1 6 0 1 6 2 2 4 0 0 4 8 2.588
685 CSCI 343 Computer Architecture SVITAK, J 35 0 1 1 0 2 3 2 7 4 6 2 7 1.900
686 CSCI 343 Computer Architecture SVITAK, J 34 0 1 0 1 4 0 2 4 2 9 1 10 1.846
687 CSCI 348 Data Communications RAHMAN, M 32 0 1 1 2 2 1 8 6 0 0 1 10 2.427
688 CSCI 352 Cryptography BOKLAN, K 17 0 0 0 0 2 1 0 1 0 0 1 12 2.140
689 CSCI 355 Internet and Web Technologies FRIED, M 40 0 9 2 4 6 4 4 3 2 3 1 2 2.816
690 CSCI 355 Internet and Web Technologies LAW, R 31 1 2 5 2 1 2 3 3 0 0 4 8 2.539
691 CSCI 355 Internet and Web Technologies LAW, R 36 2 3 1 1 1 3 1 1 0 3 6 14 2.064
692 CSCI 355 Internet and Web Technologies LAW, R 31 2 2 1 2 0 1 2 4 0 0 11 6 1.664
693 CSCI 370 Software Engineering GOLDBERG, R 33 0 5 3 3 5 2 6 1 0 0 1 7 2.969
694 CSCI 370 Software Engineering GREENBERG, A 34 1 5 3 4 8 1 6 5 0 0 1 0 2.906
695 CSCI 370 Software Engineering GREENBERG, A 32 0 4 2 7 4 3 6 3 0 0 1 1 2.880
696 CSCI 370 Software Engineering TSAI, C 16 0 0 1 2 0 0 1 2 0 0 0 10 2.767
697 CSCI 370 Software Engineering GREENBERG, A 32 0 3 4 6 5 2 2 3 0 0 4 2 2.676
698 CSCI 370 Software Engineering ABREU, A 35 0 0 2 4 6 5 3 5 0 1 5 4 2.258
699 CSCI 370 Software Engineering ABREU, A 35 0 0 0 3 2 4 5 7 1 5 3 5 1.963
700 CSCI 381 VT: Special Topics in Comp Sci RODAY, R 36 6 11 10 5 3 1 0 0 0 0 0 0 3.700
701 CSCI 381 VT: Special Topics in Comp Sci LI, J 32 0 5 5 7 4 4 2 2 0 0 1 2 3.100
702 CSCI 381 VT: Special Topics in Comp Sci GOLDBERG, R 35 0 3 3 9 3 9 4 1 0 0 1 2 2.948
703 CSCI 381 VT: Special Topics in Comp Sci PHILLIPS, T 26 3 3 2 2 2 3 2 2 0 0 2 5 2.890
704 CSCI 381 VT: Special Topics in Comp Sci ROZOVSKAYA, A 31 1 1 1 2 2 1 2 2 0 0 1 18 2.738
705 CSCI 381 VT: Special Topics in Comp Sci TSAI, C 16 0 1 4 0 1 0 1 3 0 0 1 5 2.736
706 CSCI 381 VT: Special Topics in Comp Sci CESARETTI, P 20 0 0 2 4 1 1 4 4 0 0 0 4 2.719
707 CSCI 381 VT: Special Topics in Comp Sci WAXMAN, J 40 5 1 4 4 1 1 0 5 0 6 6 7 2.233
708 CSCI 381 VT: Special Topics in Comp Sci ROZOVSKAYA, A 35 0 2 0 1 2 3 3 2 0 0 6 16 1.911
709 CSCI 381 VT: Special Topics in Comp Sci BROWN, T 21 0 0 0 1 3 0 0 6 0 0 6 5 1.519
710 CSCI 390 Honors Readings in Comp Sci GOSWAMI, M 14 3 8 1 1 0 0 0 0 0 0 0 1 3.923
In [5]:
cs_df_results = calculate_average_gpas(cs_df)
cs_df_results
Average GPA for this entire subject in Spring 2023 is: 2.5, which is a C+
The hardest class based on average GPA in Spring 2023 is class number: 320 with an average GPA of 1.9, which is a C-

Standard deviation tells us about the spread of the grades that students received in each class. A higher standard deviation indicates a wider range of grades, while a lower standard deviation indicates that grades were more closely clustered around the average.
Out[5]:
CLASS NUMBER PROF AVG GPA PROF AVG GPA PROF LETTER STD DEV GPA PROF
0 12 HUANG, X 2.9 B- 0.74
1 48 FRIED, M 3.0 B 1.09
2 48 JAGDEO, M 2.8 B- 1.28
3 48 All Professors 2.9 B- 1.19
4 85 HILL, D 2.3 C+ 1.45
5 90 CONNOR, T 4.0 A 0.00
6 111 TSE, C 2.3 C+ 1.30
7 111 CHYN, X 2.1 C 1.29
8 111 All Professors 2.2 C 1.30
9 211 WAXMAN, J 2.3 C+ 1.34
10 211 ALAYEV, Y 1.5 D 0.93
11 211 All Professors 2.0 C 1.34
12 212 LORD, K 2.4 C+ 1.30
13 212 STEINBERG, O 2.4 C+ 1.43
14 212 All Professors 2.4 C+ 1.32
15 220 KAHROBAEI, D 3.5 B+ 0.76
16 220 TAO, X 2.9 B- 0.75
17 220 KONG, T 2.2 C 1.45
18 220 GRYAK, J 2.2 C 1.13
19 220 All Professors 2.7 B- 1.19
20 240 YEH, J 2.8 B- 1.20
21 240 TAO, X 2.6 C+ 0.96
22 240 FLUTURE, S 1.9 C- 1.44
23 240 All Professors 2.6 C+ 1.22
24 313 TAO, X 2.4 C+ 1.32
25 313 MITCHELL, T 2.3 C+ 1.26
26 313 STEINBERG, O 2.2 C 1.47
27 313 SVADLENKA, J 2.1 C 1.46
28 313 All Professors 2.3 C+ 1.37
29 316 SMITH-THOMPSON, A 2.2 C 1.03
30 316 KONG, T 2.1 C 1.35
31 316 SVITAK, J 1.6 D 0.93
32 316 All Professors 2.0 C 1.12
33 320 OBRENIC, B 2.5 C+ 1.25
34 320 BOKLAN, K 0.8 F 1.22
35 320 All Professors 1.9 C- 1.34
36 323 KAHROBAEI, D 3.7 A- 0.58
37 323 PHILLIPS, T 2.5 C+ 1.55
38 323 CHYN, X 2.3 C+ 1.24
39 323 BROWN, T 1.6 D 1.38
40 323 All Professors 2.5 C+ 1.45
41 331 SY, B 2.8 B- 0.70
42 331 OBRENIC, B 2.5 C+ 0.94
43 331 CHYN, X 2.5 C+ 1.22
44 331 LEAVITT, D 2.4 C+ 0.96
45 331 HELLER, P 2.3 C+ 1.13
46 331 All Professors 2.5 C+ 1.03
47 335 GOLDBERG, R 2.9 B- 0.64
48 340 SVADLENKA, J 2.5 C+ 1.22
49 340 FLUTURE, S 2.6 C+ 0.88
50 340 SMITH-THOMPSON, A 2.7 B- 0.85
51 340 RAHMAN, M 1.9 C- 0.98
52 340 All Professors 2.5 C+ 1.07
53 343 FLUTURE, S 3.1 B 0.86
54 343 UPADHYAY, V 2.7 B- 1.15
55 343 SVITAK, J 1.9 C- 0.95
56 343 All Professors 2.6 C+ 1.13
57 348 RAHMAN, M 2.4 C+ 0.78
58 352 BOKLAN, K 2.1 C 1.13
59 355 FRIED, M 2.8 B- 1.01
60 355 LAW, R 2.1 C 1.55
61 355 All Professors 2.3 C+ 1.43
62 370 GOLDBERG, R 3.0 B 0.87
63 370 GREENBERG, A 2.8 B- 0.98
64 370 TSAI, C 2.8 B- 0.69
65 370 ABREU, A 2.1 C 1.06
66 370 All Professors 2.6 C+ 1.04
67 381 RODAY, R 3.7 A- 0.37
68 381 LI, J 3.1 B 0.82
69 381 GOLDBERG, R 2.9 B- 0.74
70 381 PHILLIPS, T 2.9 B- 1.16
71 381 ROZOVSKAYA, A 2.3 C+ 1.33
72 381 TSAI, C 2.7 B- 1.16
73 381 CESARETTI, P 2.7 B- 0.62
74 381 WAXMAN, J 2.2 C 1.48
75 381 BROWN, T 1.5 D 1.25
76 381 All Professors 2.6 C+ 1.19
77 390 GOSWAMI, M 3.9 A- 0.20
In [6]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure()
all_prof_df_results = cs_df_results[cs_df_results["PROF"] == 'All Professors']
sns.histplot(all_prof_df_results["STD DEV GPA PROF"], kde=True, color='skyblue')
plt.axvline(all_prof_df_results["STD DEV GPA PROF"].mean(), color='red', linestyle='dashed', linewidth=1, label='Mean')
min_ylim, max_ylim = plt.ylim()
plt.text(all_prof_df_results["STD DEV GPA PROF"].mean()*1.12, max_ylim*0.9, 'Mean: {:.2f}'.format(all_prof_df_results["STD DEV GPA PROF"].mean()))

# Generate a color palette with as many colors as classes
colors = sns.color_palette("husl", len(all_prof_df_results))

for idx, (_, row) in enumerate(all_prof_df_results.iterrows()):
    plt.axvline(row["STD DEV GPA PROF"], color=colors[idx], linestyle='dotted', linewidth=0.5, label=row["CLASS NUMBER"])

plt.title("Distribution of Standard Deviations of GPAs")
plt.xlabel("Standard Deviation")
plt.ylabel("Frequency")
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1))  # Adjusted the location to ensure it doesn't overlap with the plot
plt.show()
In [7]:
teacher_cs_df_results = calculate_teacher_gpas(cs_df)
teacher_cs_df_results
Professors who teach more than one class, have a GPA that is the same or higher than the subject average GPA (2.5), and have a withdrawal percentage that is less than or equal to the subject average withdrawal percentage (24.32%):

            PROF  AVG GPA PROF AVG GPA PROF LETTER  STD DEV GPA PROF  \
8       FRIED, M           2.9                  B-               1.1   
9    GOLDBERG, R           2.9                  B-               0.8   
11  GREENBERG, A           2.8                  B-               1.0   
17  KAHROBAEI, D           3.6                  B+               0.7   
24    OBRENIC, B           2.5                  C+               1.2   
34        TAO, X           2.6                  C+               1.0   
37   UPADHYAY, V           2.7                  B-               1.2   
39        YEH, J           2.8                  B-               1.2   

    NUM OF CLASSES  WITHDRAW PERCENTAGE  
8                2                  7.5  
9                3                 11.0  
11               3                  3.1  
17               3                 18.1  
24               3                 22.3  
34               4                  9.5  
37               2                 23.2  
39               3                 12.5   

Note: This ignores rate my professor ratings. This is solely based on GPA and it doesn't consider how much students actually learn from these teachers!
Average standard deviation of GPA for all teachers in this subject in Spring 2023 is: 1.3
Out[7]:
PROF AVG GPA PROF AVG GPA PROF LETTER STD DEV GPA PROF NUM OF CLASSES WITHDRAW PERCENTAGE
0 ABREU, A 2.1 C 1.1 2 12.9
1 ALAYEV, Y 1.5 D 0.9 1 33.3
2 BOKLAN, K 1.5 D 1.3 2 67.2
3 BROWN, T 1.6 D 1.3 2 21.6
4 CESARETTI, P 2.7 B- 0.6 1 20.0
5 CHYN, X 2.3 C+ 1.3 4 33.5
6 CONNOR, T 4.0 A 0.0 1 0.0
7 FLUTURE, S 2.6 C+ 1.2 5 45.9
8 FRIED, M 2.9 B- 1.1 2 7.5
9 GOLDBERG, R 2.9 B- 0.8 3 11.0
10 GOSWAMI, M 3.9 A- 0.2 1 7.1
11 GREENBERG, A 2.8 B- 1.0 3 3.1
12 GRYAK, J 2.2 C 1.1 2 19.4
13 HELLER, P 2.3 C+ 1.1 1 44.7
14 HILL, D 2.3 C+ 1.4 1 11.8
15 HUANG, X 2.9 B- 0.7 1 26.3
16 JAGDEO, M 2.8 B- 1.3 1 7.7
17 KAHROBAEI, D 3.6 B+ 0.7 3 18.1
18 KONG, T 2.2 C 1.4 3 22.9
19 LAW, R 2.1 C 1.6 3 28.6
20 LEAVITT, D 2.4 C+ 1.0 1 11.1
21 LI, J 3.1 B 0.8 1 6.2
22 LORD, K 2.4 C+ 1.3 2 26.3
23 MITCHELL, T 2.3 C+ 1.3 3 26.4
24 OBRENIC, B 2.5 C+ 1.2 3 22.3
25 PHILLIPS, T 2.6 C+ 1.4 3 27.8
26 RAHMAN, M 2.2 C 0.9 2 21.7
27 RODAY, R 3.7 A- 0.4 1 0.0
28 ROZOVSKAYA, A 2.3 C+ 1.3 2 51.5
29 SMITH-THOMPSON, A 2.3 C+ 1.0 4 10.4
30 STEINBERG, O 2.3 C+ 1.5 3 22.1
31 SVADLENKA, J 2.4 C+ 1.3 4 39.4
32 SVITAK, J 1.7 C- 1.0 4 26.1
33 SY, B 2.8 B- 0.7 1 50.0
34 TAO, X 2.6 C+ 1.0 4 9.5
35 TSAI, C 2.8 B- 1.0 2 46.9
36 TSE, C 2.3 C+ 1.3 1 41.9
37 UPADHYAY, V 2.7 B- 1.2 2 23.2
38 WAXMAN, J 2.2 C 1.4 3 16.5
39 YEH, J 2.8 B- 1.2 3 12.5

Note

To interact with the plot below (hovering over a point shows which professor it refers to), download the HTML version of this file instead.

In [8]:
import plotly.express as px

# Overall CS subject average GPA (2.5 in Spring 2023), computed the same way as in calculate_teacher_gpas()
mean_gpa = round(cs_df["AVG GPA"].mean(), 1)

# Create the scatter plot using Plotly Express
fig = px.scatter(teacher_cs_df_results,
                 x=list(range(len(teacher_cs_df_results))),
                 y='AVG GPA PROF',
                 hover_name='PROF',  # show the professor's name when hovering over a point
                 title="CS Professor GPA Averages vs CS Subject Average GPA",
                 # 'x' names the list passed as x; the y label is keyed by its column name
                 labels={'x': 'Professor (By Index Above)', 'AVG GPA PROF': 'Average GPA'})

# Add a line for the average GPA
fig.add_shape(
    type='line',
    line=dict(dash='dash', color='red'),
    x0=0,
    x1=len(teacher_cs_df_results),
    y0=mean_gpa,
    y1=mean_gpa,
)

# Show the plot
fig.show()

Teacher Analysis

Most of the professors selected above are consistent with their Rate My Professors ratings:

  • Michael Fried (https://www.ratemyprofessors.com/professor/2072193)
  • Aryeh Greenberg (https://www.ratemyprofessors.com/professor/2107674)
  • Delaram Kahrobaei (https://www.ratemyprofessors.com/professor/2870283)
  • Xiaopeng Tao (https://www.ratemyprofessors.com/professor/2806515)
  • Vivek Upadhyay (https://www.ratemyprofessors.com/professor/1116392)
  • Jackson Yeh (https://www.ratemyprofessors.com/professor/1082596)

Others are not:

  • Robert Goldberg (https://www.ratemyprofessors.com/professor/446485)
  • Bojana Obrenic (https://www.ratemyprofessors.com/professor/249702)

Professors who missed the cut because of GPA, withdrawal rate, and/or teaching only one class, but who have a Rate My Professors rating of 4 or above:

  • Xinying Chyn (https://www.ratemyprofessors.com/professor/2715711)
  • Daniel Leavitt (https://www.ratemyprofessors.com/professor/855541)
  • Robert Roday (https://www.ratemyprofessors.com/professor/2722177)
  • Oren Steinberg (https://www.ratemyprofessors.com/professor/2698138)
In [9]:
# Count the professors whose average GPA is at or above 3.0 and those below 3.0
green_count = teacher_cs_df_results[teacher_cs_df_results['AVG GPA PROF'] >= 3.0].shape[0]
red_count = teacher_cs_df_results[teacher_cs_df_results['AVG GPA PROF'] < 3.0].shape[0]

# Create the values and labels for the pie chart
values = [green_count, red_count]
labels = ['At or above 3.0 (B or above)', 'Below 3.0 (B- or below)']

# Define the colors for each section (green and red)
colors = ['#77dd77', '#ff6961']

# Plot the pie chart
plt.figure(figsize = (6, 6))
plt.pie(values, labels = labels, colors = colors, autopct = '%1.1f%%')

# Set the title
plt.title("CS Professors by Average GPA")

# Show the plot
plt.show()

The hardest class based on average GPA is CSCI 320 (Theory of Computation), with an average GPA of 1.9, which is a C-.
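The 1.9 for CSCI 320 is a plain mean of its three section averages (2.640, 2.393 and 0.800 in the CSCI table), so the 15-student BOKLAN, K section counts as much as the 82-student OBRENIC, B section. A sketch contrasting that with a mean weighted by the number of students who received a letter grade in each section (the sum of the letter-grade columns: 67, 82 and 15); the weighting is an alternative for comparison, not what calculate_average_gpas() does:

```python
import numpy as np

# Section averages and graded-student counts for CSCI 320, taken from the CSCI table
section_avg_gpa = np.array([2.640, 2.393, 0.800])
graded_students = np.array([67, 82, 15])  # sum of letter-grade columns per section

# What calculate_average_gpas() reports: a plain mean of section averages
unweighted = section_avg_gpa.mean()

# Alternative: weight each section by how many students received a grade
weighted = np.average(section_avg_gpa, weights=graded_students)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
# unweighted: 1.94, weighted: 2.35
```

For CSCI 320 the choice of weighting shifts the course average by about 0.4 GPA points, so the "hardest class" ranking depends partly on how the averages are computed.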